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We study a regression characterization for the quadratic estimator of weak lensing, developed by 
Hu and Okamoto [1611171129] , for cosmic microwave background observations. This characterization 
motivates a modification of the quadratic estimator by an adaptive Wiener filter which uses the 
robust Bayesian techniques described in [3, 4, 36 . This technique requires the user to propose a 
fiducial model for the spectral density of the unknown lensing potential but the resulting estimator 
is developed to be robust to misspecification of this model. The role of the fiducial spectral density 
is to give the estimator superior statistical performance in a "neighborhood of the fiducial model" 
while controlling the statistical errors when the fiducial spectral density is drastically wrong. Our 
estimate also highlights some advantages provided by a Bayesian analysis of the quadratic estimator. 



I. INTRODUCTION 

The cosmic microwave background (CMB) measures 
temperature fluctuations in the early Universe some 
400,000 years after the big bang. These fluctuations 
provide us with a picture of the Universe at the in- 
stant of recombination and contains a wealth of infor- 
mation for cosmology and cosmic structure. One im- 
portant characteristic of the observed CMB is that the 
photon paths have been distorted, or lensed, from the 
gravitational effect of intervening matter. Estimating 
this lensing is important for a number of reasons includ- 
ing, but not limited to, understanding cosmic structure, 
constraining cosmological parameters [20 , 34 and detect- 
ing gravity waves [23l [30j . There have been a num- 
ber of proposed estimators for the lensing of the CMB 
[Il|5l|6l[Il[ni[ll[l7l[29]. The most widely used esti- 
mate was developed in [161 HZl HF^ and is referred to as 
'the quadratic estimator'. This estimator, up to leading 
order, is an unbiased minimum variance estimator. In 
this paper we study the potential advantages obtained 
by relaxing the unbiasedness constraint, borrowing sta- 
tistical techniques from Bayesian and regression theory. 

The effect of lensing is to simply remap the CMB, 
preserving surface brightness. Up to leading order, the 
remapping displacements are given by V^, where (j) de- 
notes a lensing potential and is the planer projection 
of a three dimensional gravitational potential (see [7], 
for example). The lensed CMB can then be written 
Q{x + V(j){x)) where Q{x) denotes the unlensed CMB 
temperature fluctuations projected to the observable sky. 
In this paper we work in the small angle limit and use 
a flat sky approximation so that x ^M? . The isotropic 
and Gaussian assumptions for Q{x) translates to inde- 
pendence under the Fourier transform. However, for 
a fixed lensing potential (/), the lensed CMB becomes 
non-isotropic, leading to correlated Fourier modes. The 
quadratic estimator takes advantage of this correlation 



and uses weighted sums of Fourier cross products to un- 
biasedly (up to leading order) estimate the lensing po- 
tential. 

By relaxing the unbiasedness constraint and utiliz- 
ing robust Bayesian techniques (originally developed in 
[31 [36] for statistical regression) we propose an adaptive 
shrinkage adjustment to the nominal quadratic estima- 
tor. The new estimator is given by 
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where (piL) is the regular quadratic estimator at fre- 
quency L G M^; Cff'f^^ is a fiducial spectral density model 
for the gravitational potential ^; 2Al is the approximate 

variance of 4^{L) derived in [17]; and is an adap- 
tive shrinkage factor defined in Section IV B The for- 
mula makes it clear that the estimator is essentially an 
adaptive Wiener filter: shrinking 4^{L) when 274/, /C^"^^ 

is large and retaining 4^{L) when '^^L/Cff'f^^ is small. 

The adaptive shrinkage factor is derived as a poste- 
rior expected value in a robust Bayesian procedure and 
adapts the Wiener filter to account for uncertainty associ- 
ated with misspecification of Cff'f^^. Indeed, the Bayesian 
viewpoint is the principal advantage of the estimate: us- 
ing posterior draws of the lensing potential ^, one can 
construct estimates and quantify the uncertainty for any 
non-linear function of the gravitational potential, includ- 
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ing spectral density estimate. Moreover, in Section [IV B 
we show that the robust Bayesian procedure is easy to 
simulate without resorting to expensive Markov Chain 
Monte Carlo (MCMC). 

The non-informative prior for this Bayesian procedure 
is parameterized by a fiducial spectral density Cff'f^^ for 
the lensing potential (f) and utilizes a hierarchal struc- 
ture to induce robustness. Robustness, in this context, 
pertains to the stability under mis-specification of the 
fiducial spectral density. We illustrate this property 
with simulations that demonstrate the similarity between 
(t){L) and the optimal Wiener filter if one had access to 
the true spectral density for 0, even when the fiducial 
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model is wrong by a relatively large factor (similar ro- 
bustness properties have been demonstrated in [3 ). In 
fact, (j){L) also has a frequentist interpretation that is in- 
dependent of the Bayesian underpinnings. At the end of 
Section |IVB| we discuss the connection with a general- 
ized James-Stein shrinkage estimator [18 , where the role 
of the fiducial spectral density is essentially to specify a 
shrinkage direction. 

It should be noted that our Bayesian posterior is es- 
sentially an approximation since we model the lensing 
operation using the same first order Taylor approxima- 
tion present in the quadratic estimator. Therefore, the 
well known bias in the quadratic estimator, derived in 
[inilll], is still present in the estimate (j){L) and in the 
associated posterior samples. In the context of spectral 
density estimation one may be able to simply subtract 
this bias using the fiducial model C 



L,fid- 



However this 

would require some stability of the bias under misspeci- 
fication of Cff^^^ which is yet to be established. 

The remainder of this paper is organized as follows. 
In Section |n| we establish the connection between regres- 
sion and the quadratic estimator. This connection is then 



used in Section III to derive the optimal Wiener filter of 
the quadratic estimate when the true spectral density 
for (j) is known. In Section [IV| we derive the adaptive 
shrinkage estimator ([T]) using Bayesian techniques when 
the spectral density for is unknown. Finally, in Sec- 
tion |V| we present some simulation examples exploring 
the advantages of the estimator ([T]) and its Bayesian in- 
terpretation. 



II. REGRESSION AND THE QUADRATIC 
ESTIMATOR 



In this section we characterize the quadratic estima- 
tor in terms of generalized least squares regression. The 
main utility of this connection is the incorporation of 
tools from statistical regression that lead to the shrink- 



age estimators given in sections III and IV 



The quadratic estimator is derived under the assump- 
tion that the observed lensed CMB field is contaminated 
by additive noise and an instrumental beam. Through- 
out this paper we let Q{x) denote the observed CMB 
field with lensing, beam (denoted by (f) and noise (de- 
noted N{x)) so that 



e{x) = N{x) 



■V(t){y))Lp{x-y)dy. 



The quadratic estimator is based on a first order Taylor 
approximation in on the lensed CMB field: 0(ic + 
V^(cc)) = 6(x) + V^(cc)-V6(x)+0(^^). By truncating 
0{(j)^) terms one gets the following approximation for the 
correlation of Fourier modes induced by lensing: 



LWY ) « s^cfXt + MmL) (2) 



where = {2TT)-'^[{e + L)-LCf^j^-e-LCf^]^{e + 
L)(f*{i), Cf^ denotes the flat sky power spectrum for 
e and ^ |^(£)|2cf Q 



^ so that Q^e 



expt 



de- 



notes the power spectrum for the unlensed CMB cor- 
rupted with experimental noise and beam. To avoid 
any confusion notice that we are adopting the nota- 
tional convention of [2T (see Section 4.1) so that Cf^ 
is defined by (6(£)6(/)*) = S^-^Cf^ where G{i) = 



/e-^^-^6(x)^ and = J e 



ix-£. 



(27r)2 



Throughout the 



remainder of the paper we let angled brackets (as used in 
([2|) denote expectation over the unlensed CMB and the 
instrumental noise, while holding the potential ^ fixed. 

Equation Q exposes the nonstationarity in the lensed 
temperature field through the cross correlation of Fourier 
modes. The quadratic estimator uses this correlation 
to estimate (l){L) by weighted averaging cross products 
of Q{i) separated at lag L 7^ 0. In particular, let 
k = 1,2,... index a set of frequencies ik ^ I^^- Now, 
for each k define the normalized cross product at lag 
L, vl ]^ = Q{ik + L)Q{ikY / fhi^k) correspond- 
ing vector of these cross products ml = (vi,,i, • • O^- 
Since ^ 0, equation (l2| implies that each VL^k is a noisy 
unbiased estimate of <^L) (up to the approximation in 
([2|). Writing this statement in a regression setting one 
obtains 



(3) 



where for each fixed L, Bl is an error vector and 1 is 
a vector of ones. Approximation ([2| establishes that 
(ql) ~ 0. The generalized least squares regression es- 
timator for (l){L) is then 



(i)^(ltN£il)-iltNiV 



(4) 



where N/, is the covariance matrix of Bl which is approx- 
imated as follows: 



^^fc+L, expt ^-gfc, expt 



The above approximation is obtained from ([2| using 
Wick's theorem and discarding any 0(0) terms. Also no- 
tice that, in practice, the covariance matrix N/, is based 
on a discrete grid approximation in Fourier space that 
arises from finite sky observations of O. In particular. Si 
is approximated as 1/ AL when i = and zero otherwise, 
where AL is the area element of the grid in Fourier space. 
For the remainder of the paper we do not distinguish the 
discrete case versus the continuous and simply equate ^0 
with 1/AL leaving it understood that equality holds in 
the limit as AL 0. 

This is not the typical derivation of the quadratic es- 
timator. However, it should be no surprise they are re- 
lated since both are minimum variance estimators. To 
connect the two, notice that the only reason Sij^-^^.-^l ap- 
pears in N/, is that the terms 6(£ + L)Q{iY / fi{L) and 
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e{i'^L)e{ty/U{L) are the same when ^ + r + L = 0. 
Therefore, if we only include unique observations in v/,, 
Nl becomes diagonal. It is now easy to see that 



L Vi 



1 

2^ 



fLjereje + L)eie)* 

^£+L,expt"-'£,expt 



2<5o J C'^|^,expt^^Spt 



■d£. 



dt (5) 



(6) 



where the equalities are understood to hold in the 
continuous-to-discrete approximation inherent in and 
the above integrals. Notice that the factor of 1/2 comes 
from the fact that the integrals have non-unique terms in 
the integrand. Returning to the original characterization 
of the quadratic estimator developed in [16l[l7], (j) has the 
form Al J^d 6(-^ + L)Q{iYgi{L) di where the weights Q 
must satisfy the constraint gL{^)fL{^) ^ and the nor- 
malizing constant Al is given by A^^ = J gL{^)fL{^) dl 
to ensure unbiasedness. In [16 the optimal weights g are 
derived to be proportional to'/i(€)7[Cf+L,expt^fxpt]- 
Now, plugging in these optimal weights and using ([5| and 
(|6| one immediately gets 



4>{L) ^ (itNi^l)-HtN-iv^ 

= Al [ e{i^L)e{iygL{i)di. 

Moreover, the prediction variance follows easily from 
standard regression theory which gives the following vari- 
ance of (j) 

{4>{L)HLr)-HL)HLr 

^i{l^NlH)-\ iiL = L'; 

1 0, otherwise 

= Sl-l'2Al 
where the last equality holds by 



III. SHRINKING THE QUADRATIC 
ESTIMATOR WHEN Cf ^ IS KNOWN 



the posterior expected value given the data v l under the 
assumption that both Ql (defined in (|3|) and (l){L) are 
uncorrelated, mean zero and jointly Gaussian. Returning 
to the original characterization of the quadratic estimate 
this leads to the following expression for ^Xj^^L): 



1 



Q{l^L)Q{iYgL{l)di 



when setting \l = 5qC^^ . 



Moreover, the posterior variance of (t){L)^ under this 
Bayesian interpretation, is simply (l'''N^^l + A^^)~^ = 
5o/[(2Ai)-^ + (Cf)-i]. 

The above Bayesian interpretation of {L) relies on 
the assumption that yL,k is Gaussian. This is clearly 
violated since yL,k is a product of two Gaussians. How- 
ever, one can also interpret the ridge regression estimate 
(f)Xj^{L) in the spirit of a Wiener filter of the quadratic 
estimator which makes the Gaussianity assumption less 
worrisome. In particular, if one treats the quadratic es- 
timate (j) as data, the approximate unbiasedness of the 
quadratic estimator allows one to write 



4>{L) = c^{L) ^ e{L) 



(8) 



where (e(L)) ^ and {e{L)e{Vy) = 5l-l'2Al. The 
supposition that e is Gaussian becomes more believable 
since (/) is a weighted average of v/, which are approxi- 
mately independent random variables (up to zero order 
in (j)). If one now applies the Bayesian paradigm un- 
der the assumption that (j) and e are uncorrelated mean 
zero Gaussian random fields, then the posterior expected 
value of (j) when observing cj) becomes 



XL) 



2 At 



(9) 



when setting = Sq 



This agrees with ([7| and is clearly recognized as a Wiener 
filter of the quadratic estimate (j). 



The main reason the regression characterization of the 
quadratic estimator is useful is to easily see how one 
would relax the unbiasedness constraint when the true 
Cff^ is known. Indeed, a natural extension to the gener- 
alized regression estimate is called ridge regression which 
can be written 



.(i)^(ltN£il + A, 



(7) 



where Xl is a ridge parameter which regularizes the in- 
trinsic instability that arrises when I'^'N^^l is small (i.e. 
when the variance of the re-construction is large). When 
^(L) is a realization from a mean zero Gaussian ran- 
dom field and Xl is set to the variance of ^(i^) then the 
ridge regression estimate ^Ax, {L) can be interpreted as 



IV. ADAPTIVE SHRINKING WHEN 
UNKNOWN 



IS 



The previous section derived the optimal Wiener fil- 
ter of the quadratic estimate when the spectral density 
Cff^ is known. In this section we study the scenario that 
Cff^ is not known. The main question is how to de- 
rive a shrinkage factor, as in the Wiener filter (|9|, with- 
out knowledge of . This is derived using a hierar- 
chal Bayesian analysis which establishes that the shrink- 
age factor is a mixture over t he pos sible values of the 



supported by the data. In 



IV A 



we first discuss the 



full Bayesian analysis which requires a prior distribution 
on the unknown spectral density Cff^. To circumvent 
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computational difficulties with such an analysis we then 
recommend a non-informative generalized prior, in IV B[ 
which gives rise our robust adaptive shrinkage estimator 
([T]) presented in the introduction. 

To align our notation with standard statistical theory 
it will be useful to concatenate the values of ^(i^), for 
different frequencies L, into one data vector of length 
n, denoted (j). Define similarly as the vector of (l){L) 
values for the matching frequencies L used to construct 
0. We make the additional assumption that and 
contain only unique elements up to complex conjugation 
(so there are no distinct coordinates with values w and 
z such that z = w or z = w*). Working with the vectors 
4> and 0, instead of the functions (j) and 0, has the addi- 
tional advantage that one can easily extend to the case 
where the adaptive shrinkage is done on separate annuli 
in Fourier spa ce (wh ich is discussed in Remark 1 at the 
end of Section llVB). 



We work under the viewpoint, used to derived equation 
that treats as "data" which are then used to esti- 
mate (j). Translating equation ([9| to our vector notation, 
the Weiner filter is given simply by matrix multiplication 



where the matrices S and A are defined by 



(10) 



A = 



In addition, our notation allows us to clearly write the 
relationship between cj) and in a hierarchal Bayesian 
setting 



0|(/)-Ar((/),S) 
0|A-A/'(O,A) 



(11) 
(12) 



where (f)\A ~ A/'(0,A) means Re{(f)) and lm{(f)) are in- 
dependent Gaussian random vectors with individual dis- 
tributions given by Re{(f)) ~ A/'(0,A/2) and Im((/)) ~ 
AT (0, A/2). 



A. The Bayes Solution 

The Bayesian paradigm is the clearest way to under- 
stand how one adapts ( 10) when A is unknown. Indeed, if 



one is willing to model the uncertainty in A (equivalently 
in Cff^) using a prior probability density 7^ (A), then 



Bayes theorem in conjunction with (|11| and ([12| gives 
a posterior density 7^(0, A|0) ex V {(t)\(t))V {(t)\K)V {K) . 
The posterior density P{(f)^A\<j)) quantifies the joint un- 
certainty in the unobserved cj) and A when observ- 
ing the data 0. Notice that one can marginalize out 
A to obtain a posterior on cj) exclusively, V{cj)\cj)) = 



JV{cj),A\^)dA = JV{cj)\A,^)V{A\^)dA where ^^A cor- 
responds to coordinate- wise area element. Now the pos- 
terior expected value of cj) given the data can be com- 
puted as 



</)P(</)|A, 0) cZ0 r{A\4>)dA 



Weiner filter ([10) 



(13) 



The advantage of (13) is that it clearly exposes how to 
handle the situation when A is unknown: average the 
Wiener filter 0^ over different possibilities for A sup- 
ported by the data (through the posterior V{A\cj))). In 
fact, since 0^ depends on A only through a multiplicative 



factor, (13) simplifies to 



4>AV{A\4>)dA 



A(A + E)-ip(A|^)dA. 



4> (14) 



posterior expected 
shrinkage factor 



Therefore, to account for uncertainty in A when estimat- 
ing cj) simply replace the shrinkage factor A(A + in 
(10) with it's expected value under the posterior distri- 



bution on A, V{A\cj)). 



B. Robust generalized Bayesian adaptive shrinkage 

The Bayesian analysis, while being a complete proba- 
bilistic combination of the data and the prior knowl- 
edge for A, can be time consuming for two reasons. First, 
one needs to translate the knowledge in A to a prior dis- 
tribution 7^ (A). Secondly, getting posterior samples from 
V{A\4>) often requires advanced Monte Carlo techniques 
(similar to those found in [HI [37] for example). For these 
reasons we present a simplified non-informative general- 
ized prior on A, originally developed in [3, 36 , for quick 
exploratory analysis. This prior requires the user to spec- 



ify a fiducial spectral density model, denoted and 
is designed to be both robust against alternative spectral 
truths and computationally simple. 

To specify the prior start by defining the following ma- 
trix based on the fiducial spectral density 

The non-informative generalized prior for A is given by 

A^^(S + Afid)-S (15) 
^ has generalized density ex ^""^ on (p, oo) (16) 

where p is the largest eigenvalue of 11(1; + Afid)""^ (which 
ensures that ^(S + A^d) — S is always positive definite). 
Since the support of ^ contains 1, the prior includes the 
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fiducial model Afid as a possible truth. The generalized 
density is an improper prior (it integrates to infin- 
ity) which can be derived as the well known Jeffereys' 
non- informative prior within the class of distributions 
A ~ ^(S + Afid) — S for the hierarchal parameter ^. 
The theoretical properties of this and similar priors have 
been extensively studied in the statistical literature (see 
[3, 4, 36 ) and has been shown to yield estimators with 
desirable statistical properties. 

One of the advantages of this prior is that marginaliz- 
ing over and A collapses ( pT| , ([l2| and (15) to 



0k-^A/'(O,^(S + Afid)). 



(17) 



Now it is easy to obtain the posterior distribution on 
^. In particular, let 7^(0|^, Afid, S) denote the marginal 
density of 4> in \}J] so that 



7^(0|^,Afid,S)a 



e|S + Afid| 



exp 



(-ll^llid/e 



where ||0|||d is defined by 



liquid 



(S + A 



fidj 



-1/2 



Re(0) 



(S + Afid)-^/M0) 



^ 2 



Note, the number of elements in is n (so that the vector 
{Re{^y ^lm.{^yy has 2n entries, for example). Therefore 
a formal application of Bayes theorem for ^ under the 
model (17) gives 



ocP(^|e,Afid,E)P(0 
^exp(-||0||id/e) 



oc 



(18) 



on (p, oo) where 7^(0 oc ^ ^ denotes the prior for ^. This 
shows that the posterior V{£,\^) has a truncated inverse 
gamma distribution, which can easily be sampled from, 
as in Algorithm [T] below. 

Algorithm 1: Sample from V{^\4>) 
1: Simulate a Gamma random variable ^ with density 

proportional to C^~'^ e^p{—C\\4>\\fid) 
2: if C~^(^ + Afid) — S is positive definite then go to 

step 3, else go back to step 1 
3: return ^ ^ (^"^ 

The posterior samples from V{(,\^) do not have a direct 
physical interpretation. However, Algorithm|2]shows how 
sampling from V{£,\4>) allows easy sampling from V{cj)\4>). 
Actually, it is not immediately obvious that Algorithm 2 
gives samples from V{(t>\4>) since the prior V{^) is im- 
proper. However, a careful application of Fubini shows, 
indeed. Algorithm 2 samples V{4>\cj)). In addition, the 



posterior expected shrinkage factor in (14) can be com- 
puted as follows 

J A(S + A)-i7^(A|0) 

= / [/-r's(s + Afid)-']n^i0)^i^ 

= /-F^S(S + Afid)-' (19) 



where = ^7^(^10) and / denotes the nxn identity 
matrix. Keeping track of the normalization factor in ( 18 ) , 
one obtains the following analytic expression for 



1 



110111, P(n-l,||0|||» 



(20) 



where P(a, x) is the normalized incomplete gamma func- 
tion given by P{a^x) = t^~^e~^dt. Combining 
equations ([l9| and ([l4| one obtains the following adap- 
tive shrinkage estimate of cj) given 4> 



^ = J 0P(0|0)= [J-F<^S(S + Afld)-i]^ 



(21) 



which is recognized as the matrix form of ( [T|). 

Remark 1: In our implementation of (21) we par- 



tition the Fourier frequencies L into concentric annuli 
around the origin and construct the posteriors P(^|0) 
and V{4>\4>) separately on each annuli. In this way we 
obtain distinct shrinkage factor adjustments F^ for each 

annuli, hence the dependence of F^ on L, written F^ in 
([T]). This essentially adds fiexibility by allowing indepen- 
dent priors V{S,) for each annuli. In fact, if 2Al changes 
drastically within an annuli, one may further partition 
with the goal of obtaining partitions within which the 
values of 2Al are similar. 

Remark 2: Although, one has access to an analytic 
expression for the given in i20h this formula be- 
comes numerically unstable when either ||0|||d is small 
or n large. Therefore we recommend approximating 

-^L ~ I ^~^Pi^\4^)^^ '^y averaging samples of ob- 
tained from Algorithm [T] 

Algorithm 2: Sample from V{4>\4>) 

1: Simulate ^ from Algorithm fll 

2: Set A ^ ^(H + Afid) - S and simulate 



>A/'(a(S + A)-^^, p-^+A-^]-^) 



3: return 



Remark 3: To connect the Bayes estimator (21) with 



alternative non-Bayesian estimates of 0, it is instructive 
to consider the limit as n ^ oo. The cleanest connection 
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FIG. 1. Example 1: Pixel space estimates (bottom row) of the filtered lensing potential |L|(/)(L)l|x,|<iooo (top). The lensed 
simulation is done with arcmin pixels, a beam FWHM set to 1 arcmin and additive white noise with a standard deviation of 
25 fiK-aicmin. The quadratic estimate is shown bottom left and the robust Bayes estimate is show bottom right using an 
input fiducial spectral density C^\^ set 4 times smaller than the simulation truth spectral density Cff^ . The bottom middle 

two plots show the Wiener filter of the quadratic estimate given by (joj) based on Cf^\^ (middle left) and the true but unknown 

spectral density C^'^ (middle right). 



occurs when the fiducial model C^\^ is set to (so that 
Afid = 0) which postulates no lensing in the observed 
CMB. Making the additional assumption S = 2(j^I the 
estimate (21) simplifies to = [l — In Appendix 



Iwe show 



[1 - F*] 



(2n - 2)(t2 



+ o(l) (22) 



where o(l) uniformly in as n ^ oo and 



max{0,x}. The right hand side of (22) is the shrinkage 
factor used in the famous James- Stein shrinkage estima- 
tor p!8]. This follows since the real and imaginary parts 
of 4> ctre modeled as 2n independent Gaussian random 
variables, each with variance a^. In fact, for any fixed 
number e > 0, the supremum of |o(l)| over the region 

> + e converges to zero exponentially fast as 
n ^ oo. Therefore if one changes 2n — 2 in the numera- 



2n-2 



tor of (22) to, say 2n, this exponential convergence fails 
to hold. Even more is true: under the same assumptions 
on Afid and S, the results of [2j show that is a min- 
imax and admissible estimate of with respect to the 
quadratic loss when n > 2. Similar results for (21) and 



other Bayes estimates have been derived in the statistical 
literature (see [11|25] for a review of the literature). 



V. SIMULATIONS 

We present three simulation examples which give an 
overview of some of the advantages provided by the ro- 
bust Bayesian procedure developed in Section |IV[ The 
first example demonstrates robustness and compares 
with the optimal Wiener filter. The second compares 
point-wise error quantification with the quadratic esti- 
mator. The final example explores spectral power esti- 
mation and demonstrates the robust Bayesian method 
can generate more accurate detection levels than the 
quadratic version based on spectral density estimation. 



Example 1 

In this example we consider the problem of imaging 
estimated over-densities (in pixel space) of the filtered 
potential. We focus on the filter |L|^(L)l|2^|<iooo where 
l|Zy|<iooo is the top hat indicator with radius 1000. We 
remove the spectral power beyond 1000 for two reasons. 
First, the high frequency power must be removed from 
the quadratic estimate or else the noise completely domi- 
nates the image. Secondly the, so called, n]^^ bias starts 
to significantly contaminate the results beyond 1000 (see 
11Q1122,). 

The bottom row of images in FIG. [l] show four dif- 
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FIG. 2. Example 2: Pixel space line transect plots of the robust Bayes estimate (left, blue) and quadratic estimate (right, 
blue) of the filtered lensing potential |L|(/)(L)l|x,|<iooo plotted in red for both left and right diagrams. The shaded region 
corresponds to 2cr error bars on the right and a point- wise 95% posterior region on the left. The lensed simulation is done with 
arcmin pixels, a beam FWHM set to 4 arcmin and additive white noise with a standard deviation of 5 /ii^-arcmin. The input 
fiducial spectral density Cf^^^ used for the robust Bayes estimate is set 4 times larger than the simulation true spectral density. 



ferent estimates of the filtered lensing potential in pixel 
space based on a lensed CMB simulation in a 17^ x 17^ 
patch of periodic flat sky observed on arcmin pixels. The 
lensed CMB simulations are done using a Gaussian beam 
FWHM of 1 arcmin and additive white noise with stan- 
dard deviation 25 /ii^-arcmin (further simulation details 
can be found in Appendix [A|). The top plot shows the 
simulation truth, zoomed in on an over dense region. 
The bottom left shows the corresponding region for the 
quadratic estimate and the bottom right shows the robust 
Bayesian estimate given in ([T]) where the fiducial spectral 
density Cff'f^^ is 4 times smaller than simulation truth 

spectral density Cf^^. The middle two plots show Wiener 
filters of the quadratic estimate given by ([9| based on the 
true, but unknown spectral density C^^middle right) 
and on the same fiducial spectral density Cff^^^ (middle 
left) used to generate the robust Bayes estimate. 



There are three things to notice here. First, the ro- 
bust Bayes estimate successfully shrinks the noisy high 
frequency terms in the quadratic estimate (i.e. far left is 
noisier than than far right). Secondly, if one simply used 
the Wiener filter ([9| based on the wrong fiducial model 
^L^fid would seriously over shrink the quadratic esti- 
mate (i.e. middle left over shrinks). Three, the adaptivity 

factor is adjusting the robust Bayes estimate to be- 
have more like the optimal Wiener filter ^ when one 
has access to the true spectral density C^'^(i.e. middle 
right and far right look similar) . This demonstrates some 
robustness to mis-specification of the fiducial model. 



Example 2 



The second simulation compares point- wise error quan- 
tification for estimating the same filtered potential 
|L|0(L)l|2^l<iooo as in last example. To contrast with 
the last example, we use a different fiducial spectral den- 
sity: one that is 4 times too large rather than 4 times too 
small. We also change the beam FWHM to 4 arcmin and 
additive noise standard deviation to 5 /iK-arcmin (the 
pixel size and sky coverage is the same as the last ex- 
ample). FIG. [2] shows line transect plots of the filtered 
simulation truth, plotted in red for both left and right 
diagrams. The right diagram shows the quadratic esti- 
mate (blue) with 2<j error regions shaded grey. The 2(j 
region is computed using the Fourier space variance ap- 
proximation 2|LpA/,l|2^|<iooo foi" the filtered quadratic 
estimate. The left diagram shows the robust Bayesian es- 
timate (blue) with the shading region denoting a 95% the 
posterior region (point-wise) which is determined from 
simulation. We also included a plot of one posterior pos- 
terior sample (dashed). The point- wise error bars cor- 
responding to the Bayes estimate are somewhat smaller 
than those from the quadratic estimator. However, the 
main feature of these plots is that, even though the fidu- 
cial model is 4 times too large, the Bayes estimate suc- 
cessfully shrinks the noisy high frequency terms in the 
quadratic estimator. One can also see the advantage of 
having posterior realizations of possible truths supported 
by the data (dashed) which allow joint quantification of 
uncertainty rather than simple point- wise mean and stan- 
dard error bars provided by the quadratic estimate. 
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Quadratic estimator 




FIG. 3. Example 3: The left plot illustrates spectral power estimation for the convergence k{L) using the quadratic estimate 
(squares) and the robust Bayesian method (circles and stars) for two different input fiducial models (dashed lines). The error 
bars for the Bayesian estimates correspond to 2.5 and 97.5 posterior percentiles, whereas the quadratic estimate is displayed 
with 2(7 error bars (see Section |v] for simulation details). The histograms, at right, show the quadratic estimates and the Bayes 
estimates of the spectral power in k{L) at wavelength L = 200 based on 1000 independent lensing simulations. The dashed red 
vertical lines in the histogram plots designate the value of the true spectral density C2oo- See Section |V| for further details. 



Example 3 



In this example we explore spectral power detec- 
tion and demonstrate that the robust Bayesian method 
can generate more accurate detection levels than the 
quadratic version based on spectral density estimation. 
CMB gravitational lensing surveys often focus on de- 
tecting non-zero spectral density power in the so called 
convergence field k{x) = —V'^(j){x)/2 (using the nota- 
tion found in [38]) which is a tracer of projected mass. 
Typically, the quadratic estimate is used to estimate the 
spectral density C'l'^ which, along with standard error 
bars, are then used to report a-detection levels. However, 
when estimating the spectral density there is always un- 
avoidable cosmic variance, which limits the a-detection 
level achievable. To illustrate this fact, consider the case 
that one is able to observe the n{x) field directly, with 
no noise. In this case there is no uncertainty in lensing 
detection. However, there is still cosmic variance uncer- 
tainty for estimating the spectral density . We ex- 
plore this issue by taking advantage of the robust Bayes 
method which easily produces estimates of the spectral 
power in n directly, instead of through C'l'^. We demon- 
strate better a-detection levels for gravitational lensing 
than would be obtained through quadratic estimates of 

The parameter of interest, in this case, is the spectral 
power in n. Under the isotropic assumption on n it is 
natural to radially average the spectral power over con- 
centric annuli about the origin. We only consider annuli 
which are cut in half since i^{—L) = n{LY ^ because i<i{x) 
is a real random field, which implies redundant informa- 



tion on the other half of the annuli. In particular, for 
each half annuli we want to estimate the quantity 



(23) 



where AL denotes the area of the grid spacing in Fourier 
space arising from finite sky observations, and de- 
notes the number of observed Fourier frequencies in A. 
Notice the factor AL makes (23) an unbiased estimate 



of when the annuli radii and AL are infinitesi- 



mally small (this follows by the equality {n{L)i<i{L'Y 



L-L'^L where denotes expectation with respect to 



The Bayesian estimate for (23) is exceedingly easy to 



construct: simply average the quantity (23) computed 
on each posterior sample (j) obtained from Algorithm [2j 
To compare with the quadratic estimate of the spectral 
density which is typically used for detection, we use 
the following unbiased (up to leading order) quadratic 
estimate of C'l'^ 



AL ^ 



\kiL)\' 



1 



J2 \L\'Ar^/2 (24) 



LeA 



where k{x) = —V'^(j){x)/2 and (j) is the quadratic esti- 
mate. Under the Gaussian approximation one gets the 
following variance for 



var C'l'^ 



1 



(25) 



This approximation agrees with [22] and [10] (in the 
curved sky) when 2Al is rotationally symmetric (the 
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missing factor of 2 found in [22^ and [TO^ appears since 
our A is only half an annuls). Notice that to compute 
the variance of C^'^ one must know the unknown spec- 
tral density To approximate varC£'^ we therefore 
replace in the right hand side of equation (25) with 



the corresponding error bars for the quadratic estimate. 
This is due to the fact that the Bayes estimate does not 
contain cosmic variance uncertainty between (23) and 



the estimate 

Our first comparison of the Bayes and quadratic es- 
timates of (23) is shown in the left plot of FIG. [s] 
and is based on one realization of a lensed CMB field 
X W periodic sky, arcmin pixels, beam FWHM = 

1 arcmin, white 25 /ii^-arcmin noise) with spectral den- 
sity shown in red. We computed two Bayes esti- 
mates: one which uses a fiducial model Cff'f^^ which is 2 
times too large (estimates are denoted with circles, fidu- 
cial model with dashed line), and one using Cff'f^^ which is 

2 times too large (estimates are denoted with stars, fidu- 
cial model with dashed line). The estimates are based 
on the first 5 concentric half annuli (width = 50) about 
the origin. The attached error bars correspond to 2.5*^ 
and 97.5*^ posterior percentiles. The squares show the 
quadratic estimate (24) with 2a error bars based on ([25|). 



There are two points here. First, the Bayesian error bars 
are smaller and more naturally handle the positivity con- 
straint. Secondly, the Bayesian regions are relatively ro- 
bust with respect to misspecification of Cf^\^ (i.e. the 
error bars attached to the circles and the stars are about 
the same size). 

For our second comparison we check the posterior cov- 
erage probabilities, the quadratic estimate coverage prob- 
abilities and the relative sizes of the quadratic and Bayes 
error bars. We simulated 1000 independent lensed CMB 
realizations (with the same simulation configuration as 
above) using independent realizations of the unlensed 
CMB, the noise and the lensing potential for each simula- 
tion. The right hand plot of FIG.[3]shows the histograms 
of the corresponding quadratic and Bayes estimates of 
( [23| ) at frequency \L\ = 200 (with half annuli width set 
to 50) . The Bayes estimates were generated using a fidu- 
cial spectral density which is 2 times smaller than the 
input spectral density used for simulation. The dashed 
red vertical lines designate the value of the true spectral 
density C^q'o- 

The two histograms in FIG. [3] are similar with a slight 
variance and bias reduction in the Bayes estimates (15% 
smaller variance and 50% smaller bias). However, the 
main difference can be found in the associated error bars. 
For the Bayes estimate we use a posterior region based on 
2.5*^ and 97.5*^ posterior percentiles for estimating (23), 



not C200' Surprisingly, exactly 950 of the 1000 posterior 
regions contained the true value of (23), which is different 



in each simulation. Conversely, 934 out of the 1000 2a- 
error bars associated with the quadratic estimate covered 
the true value of C2oo- One would expect typical Monte 
Carlo fiuctuations of about ±7 (if true coverage was 95%) 
so this gives moderate evidence that the quadratic esti- 
mate error bars slightly undercover the truth. Moreover, 
the width of the posterior regions are 15% smaller than 



200- 

It should be noted that it is possible to adjust the vari- 
ance calculation (25) to represent the uncertainty for esti- 
mating (23), rather than the uncertainty for This is 



somewhat beside the point, however, that the Bayesian 
estimates do this naturally no matter what functional 
one is estimating. There still needs to be a verifica- 
tion process that the Bayesian method behaves appro- 
priately, however, for any rigorous scientific application. 
This most likely would include some type of simulation 
study. It may also include, as is done in Remark 3 of 
Section |IVB[ a derivation of the frequentest properties 
of the Bayesian estimates. 



VI. DISCUSSION 

The unbiasedness constraint in the quadratic estimator 
forces large variability in the estimated gravitational po- 
tential, especially at high frequency. This can present 
difficulties for exploratory data analysis, mapping the 
lensing potential and estimating nonlinear functionals of 
the gravitational potential, for example. In this paper 
we study the potential advantages obtained by relaxing 
the unbiasedness constraint in the quadratic estimator. 
To accomplish this we derived a regression characteri- 
zation of the quadratic estimate which then allows one 
to clearly see how bias can be introduced for a reduc- 
tion of variance: through ridge regression and Bayesian 
techniques. The Bayesian framework is especially cogent, 
essentially treating the quadratic estimate as data while 
incorporating prior information on Cff^. The resulting 
estimate is an adaptive Wiener filter adjustment to the 
raw quadratic estimate — shrinking frequencies with small 
SNR and retaining those with high SNR. As an alterna- 
tive to a full Bayesian analysis, which can be somewhat 
demanding, we present a non-informative prior which not 
only leads to estimates with desirable frequentist prop- 
erties (such as an asymptotic James-Stein shrinkage be- 
havior) but also yields a posterior distribution which is 
easy to simulate without resorting to Monte Carlo tech- 
niques. The non-informative prior requires the user to 
input a fiducial model, but is designed to be robust to 
misspecification of this input model. 

One clear advantage of the Bayesian analysis is the 
availability of posterior samples to construct estimates 
and uncertainty quantification for any non-linear func- 
tion of the gravitational potential, including spectral den- 
sity estimation. Indeed, rather than using the shrinkage 
formula ([T]) directly (it becomes numerically unstable) 
we recommend averaging samples from the posterior ob- 
tained from algorithms [1] and [2] given in Section pVB In 
Section |V| we explored the advantages gained by having 
easily obtained posterior samples: estimating and quan- 
tifying uncertainty in functionals of the gravitational po- 
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tential and joint quantification of uncertainty. 

A drawback of the above Bayesian analysis — indeed of 
the quadratic estimate itself — is the bias obtained from 
the Taylor truncation on the lensed CMB used to derive 
the quadratic estimate. The spectral properties of this 
bias is relatively well understood [lOl [22] when marginal- 
izing over the randomness inherent in the large scale 
structure. However, it is unclear how this bias effects 
other features of the estimate of (j). Therefore, when ap- 
plying the Bayesian methods described in the paper, we 
recommend avoiding the frequencies which are shown to 
be contaminated by biases. An alternative to avoiding 
frequencies is adjust 2Al to include the so called N^^ 
and N)^ biases. However, until we have a good under- 
standing of the sensitivity of N^^^ and N^^^ to a fiducial 
model this remains an unexplored possibility. 



Appendix A: Simulation Details 

The fiducial cosmology used in our simulations is 
based on a flat, power law ACDM cosmological model, 
with baryon density = 0.044; cold dark matter 
density l^cdm = 0.21; cosmological constant density 
Vt\ = 0.74; Hubble parameter h = 0.71 in units of 
lOOkms"^ Mpc~^; primordial scalar fluctuation ampli- 
tude As{k = 0.002 Mpc-^) = 2.45 x 10"^; scalar spectral 
index ns{k = 0.002 Mpc~^) = 0.96; primordial helium 
abundance Yp = 0.24; and reionization optical depth 
Tr = 0.088. The CAMB code is used to generate the 
theoretical power spectra [26 . 

To construct the lensed CMB simulation used in this 
paper we first generate a high resolution simulation of 
Q{x) and the gravitational potential (j){x) on a periodic 
17° X 17° patch of the flat sky with 0.25 arcmin pixels. 
The lensing operation is performed by taking the numer- 
ical gradient of ^, then using linear interpolation to ob- 
tain the lensed field Q{x -\-\/(j){x)). We down-sample the 
lensed field, every 4*^ pixel to obtain the desired arcmin 
pixel resolution for the simulation output. A Gaussian 
beam is then applied in Fourier space using FFT of the 
lensed fields. Finally white noise is added in pixel space. 



Appendix B: Derivation of equation (22) 



Start by letting 5 = ||0|||^/(n — 1) which simplifies to 

a^2n-2) ^hcu Afid = and S = 2a^I. Notice that 's' be- 
haves like a ratio between the observed signal power and 
the nominal noise level. To make the following calcula- 
tions more readable we also let m = n — 1. To establish 



( 22 ) it will be sufficient to show 



1 P (m + 1, ms) 
s P{m^ms) 



■0(1) 



(Bl) 



where o(l) is a function of both m and s such that 

sup|o(l)|^0 as m ^ oo. (B2) 

s>0 



Notice that 1 



This 



1 P(m-\-l,ms) • • • • 

- p/ N IS mcreasmg m 

S y [771,1713 ) ^ 

was shown in Lemma 2.1.1(vii) of [3] where it is noted 
that the density for the random variable has a de- 
creasing monotone likelihood ratio in s. This is sufficient 
for stochastic domination (see Lemma 3.4.2 in [24 , for 
example) which implies the expected value = 

/ ^"^^(^10) = I ^p7^,!C)^^ go^^ ^^^^ ^^^^ ^ S^^^ ^p- 
Therefore the supremum supQ^g<;L 1^(1)1 attained at 
5 = 1. This follows by the fact that 1 — l^^^i±lill^ 

^ s Fym^ms) 

increasing and positive (because it equals 1 — and 
is supported in (0, 1]). Therefore to prove ( |B2[ ) it will 
be sufficient to establish sup5>]^ \o(X)\ ~^ since 

sup |o(l)| <sup|o(l)|. 

0<s<l s>l 

To study the case 5 > 1, integration by parts gives 



^ lP{m-\-l,ms) ^ 
s P{m,ms) 



, ms 



1 P{m^ 1 
s 
1 

s ml sP(m^ms) 



P[m^ ms) 



Therefore 



sup |o(l) 

S>1 



< 



(m/e)^ (56^-^)^ 

^ — — sup — ^ 

ml s>i sPym^ms) 

(m/e)^ 



m!P(m, m) 
2 ^0 



nm 



(B3) 
(B4) 



Line (B3) follows since se^~^ < 1 (with equality only 
when 5 = 1) and the fact that P{m^x) is an integral of 
a positive function over the interval (0,x]. Line (B4) 
follows by Stirling's approximation and the fact that 
P(m,m) = P^[X]feLi-^fc ^ ^] where Xk are indepen- 
dent Gamma random variables with mean and variance 
1. The central limit theorem then says P(m, m) = 
Pr[v^ ELi(^/c - 1) < 0] ^ Pr[Z < 0] = 1/2 where Z 
is a standard normal random variable. 



Appendix C: FFT and the quadratic estimator 

Unfortunately the regression format is not amenable 
to computation. Therefore one must take advantage of 
the Fourier filtering characterization of the quadratic es- 
timator to implement it on the computer. There are a 
few details that we mention here. The nominal form of 
the quadratic estimator is 

^{L) =Al [ e{i + L)e{iygL{i) di 
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where the weights g must satisfy the constraint 
9L{^)fL{^) > and the normahzing constant A^^ = 
/ 9L{^)fL{^)d^ ensure unbiasedness. In [16] the op- 
timal weights g are derived to be proportional to 
fL{iy/[C2^L,e.ptCfZpt]' However there may be cases 
where some pairs of frequencies {i^i + L) need to be 
excluded. For example, when O is approximated us- 
ing a discrete FFT one wants to avoid using the fre- 
quencies that have a large amount of aliasing: when 
\ii\ > 27r/(2Axi) or 1^2 1 > 27r/(2Ax2) (the Nyquist lim- 
its, where Axi is the grid spacing in the first coordinate 
position space). In addition we want to avoid using 0(0) 
(since it contains no information on lensing). To handle 
this we introduce a masking function M{i) and absorb 
it into the optimal g 

'^^+Zy ,expt '^£,expt 

In our case we use the following mask 



M{i) 



£^0 



to avoid aliasing errors (we cut at half the Nyquist) and 
the frequency. This masking slightly alters the gradient 
filtering characterization of the quadratic estimate found 
in [16^ . Under the assumption that the masking function 
satisfies M{i) = M{—£) one can derive following charac- 
terization of the quadratic estimator: 



(l){L) = -2iAL L ■ / e-''^ ''G{x)W{x) 



dx 

^2^ 



(CI) 



where G{i) 



iicf'^if{iyM{i) 

^£,expt 



e(£) and W{i) 



^£,expt 

Since the masking function is not radially symmetric, 
the normalization factor Al will not be either. Since a 
nominal Riemann approximation to the integral is com- 
putational intensive (a two dimensional integral is re- 
quired for each L) we derive a Fourier representation 
similar to (CI) which can utilize a Fast Fourier Trans- 
form. The normalizing factor becomes 



E 

k,j=\ 



^ f e-^^-^[Aix)CkAx) - Bkix)B,{x)]^ 



where 



A{£) ^ I^WJMW , Bk{£) ^ i4C««A(£) and 

^£,expt 



Ck,j{^) = WCf^)^A{i) for kj = 1,2. 
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